Abstract: Eighteen classical mutant strains of Neurospora crassa were subjected to whole-genome sequence 
analysis, and their mitochondrial genomes are analyzed here. Overall, the mitochondrial genomes of the classical mutant 
strains are 99.45 to 99.98 % identical to the reference genome. Two-thirds of the SNPs and three-fourths 
of the indels identified in this analysis are shared among more than one strain. Most of the limited variability in 
mitochondrial genome sequence is neutral with regard to protein structure. Although the mitochondrial 
genome is present in multiple copies per cell, many of the polymorphisms were homozygous within each 
strain. Conversely, some polymorphisms, especially those associated with large-scale rearrangements, are 
present in only a fraction of the reads covering each region. The impact of this variation is unknown, and further 
studies will be necessary to ascertain whether this level of polymorphism is common among fungi and whether it 
reflects the impact of ageing cultures. 
INTRODUCTION 

Widely regarded as being endosymbionts of ancient 
proteobacterial origin, mitochondria are a defining 
characteristic of eukaryotic organisms (Gray et al. 1999). The 
availability of Neurospora strains carrying mutations in the 
mitochondrial genome enabled the first studies of maternal 
inheritance in Neurospora (Mitchell & Mitchell 1952), and 
mitochondrial inheritance has been shown to be primarily 
maternal for Neurospora (Mannella et al. 1979) as well as 
for other filamentous fungi (Griffiths 1996). In rare cases 
mitochondrial genome markers are transmitted by the 
fertilizing cytoplasm or in unstable heterokaryons (Collins & 
Saville 1990). Mitochondrial genome analysis has been used 
both to understand fundamental aspects of evolution (Gray 
et al. 1999) and as a source of markers for population and 
species delimitation (Moore 1995). Some recent analysis of a 
rapidly expanding pool of information has led to re-evaluation 
of some of the assumptions of early studies of mitochondrial 
genetics (Galtier et al. 2009). Moreover, mitochondrial biology 
has seen a resurgence of interest as degraded mitochondria 
were reported in brain tissue from Alzheimer's (Sultana & 
Butterfield 2009) and Huntington's (Damiano et al. 2010) 
patients. 

Filamentous fungi have been described as providing a 
good model for the study of mitochondrial inheritance and 
biology (Griffiths 1996). In one instance a fungal mitochondrial 
genome project emphasized high level comparisons and 
used one representative of each major phylogenetic lineage 
(Paquin et al. 1997). More recent pan-fungal phylogenetic 
analysis, however, did not include mitochondrial markers 
(James et al. 2006) and recent fungal genome analysis 
does not emphasize mitochondrial biology (Martin et al. 
2011), although some authors have described mitochondrial 
genomes as part of their whole genome sequence projects 
(Torriani et al. 2008). The N. crassa mitochondrial genome 
is 64,800 bases and it encodes twenty-eight protein coding 
genes, as well as two rRNAs and twenty-eight tRNA genes 
(Borkovich et al. 2004). Among these are genes for the electron 
transport chain, subunits of the mitochondrial ATPase, 
protein synthesis, and genes of unknown function. Compared 
to other mitochondrial genomes, the N. crassa mitochondrial 
genome is larger than many, but still near the middle of the 
19 to 109 Kb range for fungi as well as for the overall range of 
16 to 366 Kb from human to Arabidopsis (Bullerwell & Lang 
2005). The Neurospora mitochondrial genome is a circular 
molecule and it varies somewhat in size depending on the 
presence or absence of optional intron sequences (Griffiths 
1996; Collins & Lambowitz 1983). Additionally, an aberrant 
version of the NADH dehydrogenase was characterized in 
the Neurospora mitochondrial genome (de Vries et al. 1986) 
and this was ultimately associated with a duplication that 
includes two tRNA genes as well as the mutant version of 
the NADH dehydrogenase subunit 2 (Agsteribbe et al. 1989). 
Mitochondrial genome rearrangements were associated 
with the intermittent cessation of growth phenotype known 
as 'stopper' and these rearrangements involved the NADH 
dehydrogenase gene fragment (de Vries et al. 1986). 
Additionally, while Neurospora mitochondria have been 
known to harbor various plasmids, the Varkud satellite 
plasmids were recently shown to be phenotypically neutral 
(Keeping & Collins 2011). Other mitochondrial plasmids 

Table 1. Strains employed in the current analysis. 

FGSC #  Gene*    Mutagen        Genetic background*  Reference
106     com      UV             SL3                  (Perkins & Ishitani 1959)
305     amyc     ?              SL3                  (Atwood & Mukai 1954)
309     ti       X-rays         SL3                  (Perkins 1959)
322     ty-1     spontaneous    M                    (Horowitz et al. 1961)
821     ts       spontaneous    M                    (Nakamura & Egashira 1961)
1211    dot      spontaneous    SL3                  (Perkins 1962b)
1303    fi       spontaneous    M                    (Perkins 1962b)
1363    smco-1   Mustard        L                    (Garnjobst & Tatum 1967)
2261    do       UV             SL2                  (Perkins 1962b)
3114    Sk-2     Introgression  SL                   (Turner & Perkins 1979)
3246    fs-n     spontaneous    M                    (Mylyk & Threlkeld 1974)
3562    mb-1     UV             M                    (Weijer & Vigfusson 1972)
3564    mb-2     UV             M                    (Weijer & Vigfusson 1972)
3566    mb-3     UV             M                    (Weijer & Vigfusson 1972)
3831    ff-1     spontaneous    M                    (Tan & Ho 1970)
3921    tng      spontaneous    SL2                  (Springer & Yanofsky 1989)
7022    fid      spontaneous    M                    (Perkins 1962b)
7035    per-1    UV             SL3                  (Howe & Benson 1974)

*Gene refers to the genetically characterized locus that was putatively identified by whole genome analysis in McCluskey et al. (2011). 

*In the genetic background field, SL is used to indicate the reference genome background (St Lawrence) and the following number indicates 
how many generations of backcrosses to a reference strain were carried out. L indicates the Lindegren background while M is used when the 
background is mixed or not documented. 

are known to induce senescence, presumably through 
recombination with the mitochondrial genome (Court et 
al. 1991). Self-splicing introns of the 25S rRNA gene were 
identified in Neurospora mitochondria (Garriga & Lambowitz 
1983) and led to the characterization of the mechanism of 
self-splicing of the group I introns in Neurospora (Garriga 
et al. 1986). Whole genome resequencing has been used 
to analyze the nuclear genome of numerous N. crassa 
classical mutant strains (McCluskey et al. 2011) and that 
dataset provides unprecedented insight into Neurospora 
mitochondrial genetics and biology. Because most whole 
genome data include mitochondrial sequence, it is likely that 
analyses of mitochondrial genomes will become available for many 
fungal taxa, and this suggests a renaissance of interest in 
mitochondrial genetics in fungi. 

MATERIALS AND METHODS 

Total DNA from Neurospora strains (Table 1) was prepared 
as described (McCluskey et al. 2011). Most strains had been 
preserved on anhydrous silica gel (Perkins 1962a) since their 
original deposit into the FGSC collection, without multiple 
passages. For example, strain FGSC 1303 was preserved in 
1966 and strain FGSC 1363 was preserved in 1967. Some of 
these strains have morphological abnormalities and, for these 
strains, the cultures were macerated with a sterile glass tissue 
grinder and resuspended in fresh culture medium to allow 
production of enough tissue for DNA extraction. Genome 
sequencing was carried out at the US DOE JGI using the 
Illumina platform as described (McCluskey et al. 2011). 

SNP and indel analysis was carried out using the MAQ 
software platform, version 0.7.1 (Li et al. 2008). Larger indels 
and rearrangements were assessed using Breakdancer (Chen 
et al. 2009b). Comparative analysis of polymorphisms was 
carried out as previously described (McCluskey et al. 2011). 

RESULTS 

Among all the resequenced strains 129 single nucleotide 
variants (SNV) occurring at 67 different positions in the 
mitochondrial genome were detected. Of these, 48 were found 
in only one strain each while nineteen were found in two or 
more strains (Table 2). Two variants were present in fourteen 
and seventeen strains respectively. The SNV found in fourteen 
strains is a C to G at position 2,246 in non-coding sequence. 
The SNV found in seventeen strains occurs at position 17,478 
just downstream from the mitochondrial ribosomal protein 
S5 (S3). All of the SNVs are non-coding except one that 
encodes a synonymous substitution in NCU16015 in strain 
FGSC 821 (Table 3). One strain, FGSC 3566, had the most 
SNVs in its mitochondrial genome, with 36 SNVs, of which 23 
are unique to this strain. With the exception of the C to G at 
position 17,478 all of the SNVs in this strain are ambiguous 
with alternate bases making up 2 to 49 % of reads. In every 
case, among these variants in strain 3566 the primary call at 
each variant site was identical with the reference genome. At 

Table 2. Number of mitochondrial polymorphisms in each of 18 strains of Neurospora crassa. 

Strain  SNP  Indel  CDS indel
106     2    14     0
305     4    65     5
309     3    23     0
322     7    83     5
821     6    133    8
1211    4    69     4
1303    4    53     4
1363    Q    11     0
2261    2    10     0
3114    4    51     2
3246    3    28     1
3562    6    27     1
3564    3    70     4
3566    36   322    21
3831    16   153    10
3921    3    18     2
7022    11   92     3
7035    9    29     1

CDS indels occur within the coding sequence of an ORF and are also included among the total count of indels. 



the other extreme, several strains had fewer than three or four 
SNVs and all of these strains included the C to G mutation 
at position 17,478. Strains FGSC 106 and FGSC 2261 each 
had only two SNVs and these were both shared and had no 
significant alternate base calls. 
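
The distinction drawn above, between sites with no significant alternate base calls and sites where an alternate base makes up 2 to 49 % of reads, can be illustrated with a minimal sketch. The function name, the read counts, and the 2 % cutoff are illustrative assumptions and are not taken from the MAQ pipeline used in this study.

```python
def classify_site(ref_reads: int, alt_reads: int, min_fraction: float = 0.02) -> str:
    """Label a site by the fraction of reads supporting the alternate base.

    min_fraction is a hypothetical noise cutoff, not a value from the paper's pipeline.
    """
    total = ref_reads + alt_reads
    if total == 0:
        return "no coverage"
    alt_fraction = alt_reads / total
    if alt_fraction < min_fraction:
        return "reference"            # alternate base absent or negligible
    if alt_fraction > 1.0 - min_fraction:
        return "homoallelic variant"  # essentially every read carries the variant
    return "mixed"                    # reference and alternate bases both present


# Hypothetical read counts for three sites in one strain.
for label, ref, alt in [("site A", 120, 0), ("site B", 70, 30), ("site C", 1, 150)]:
    print(label, classify_site(ref, alt))
```

Under these assumptions, a site such as those described for FGSC 3566, where the reference base remains the primary call, would be labelled "mixed".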

A total of 1,250 insertions and deletions were identified 
among the strains. These represent 553 unique 
changes relative to the reference genome, occurring at 
475 positions. Of all the independent occurrences of 
indels, 1,080 were annotated as homoallelic while 
170 were identified as multiallelic (that is, different reads 
were recovered for the same location in one strain). In total, 
662 deletions and 588 insertions were characterized. Three 
hundred and twenty-five indels occur only once in the dataset 
while 228 indels occur among two to eighteen strains. Sixty-six 
sites have two different variants (insertions or deletions of 
a different base, or of a different number of bases) and 4 sites 
have 3 variants. 
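
The indel counts in this paragraph are reported as three separate partitions, and a short check confirms that each partition adds up to its stated total. The variable and function names below are illustrative; the numbers are those given in the text.

```python
# Counts reported in the text for indels across the eighteen strains.
total_indels = 1250
by_allele_state = {"homoallelic": 1080, "multiallelic": 170}
by_direction = {"deletion": 662, "insertion": 588}

unique_changes = 553
by_recurrence = {"single strain": 325, "two to eighteen strains": 228}


def partition_ok(parts: dict, whole: int) -> bool:
    """Return True when the parts of a partition sum to the stated whole."""
    return sum(parts.values()) == whole


print(partition_ok(by_allele_state, total_indels))
print(partition_ok(by_direction, total_indels))
print(partition_ok(by_recurrence, unique_changes))

# Share of indel occurrences that were homoallelic.
homoallelic_share = by_allele_state["homoallelic"] / total_indels
print(f"{homoallelic_share:.1%}")
```

All three partitions are internally consistent, and roughly 86 % of the indel occurrences are homoallelic.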

One indel site, at position 12,228, is variant in all eighteen 
strains. This position, falling in intergenic space 
between the full-length NADH dehydrogenase (NCU16004) 
and the mitochondrial ribosomal protein S5 (NCU16005), has 
sixteen deletions of one T and two insertions of one T and 
these all occur adjacent to a stretch of nine Ts. Most of the 
indels that are found in multiple strains occur among stretches 
of five or more repeats of the same base as the specific indel. 
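
The observation that shared indels tend to fall in runs of five or more identical bases can be checked mechanically. The following is a minimal sketch on a toy sequence; the function names and the example sequence are hypothetical and are not drawn from the reference genome.

```python
from typing import Tuple


def homopolymer_run_at(seq: str, pos: int) -> Tuple[str, int]:
    """Return (base, length) of the run of identical bases containing seq[pos] (0-based)."""
    base = seq[pos]
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1
    return base, end - start + 1


def in_long_run(seq: str, pos: int, min_len: int = 5) -> bool:
    """True when the base at pos sits in a run of at least min_len identical bases."""
    return homopolymer_run_at(seq, pos)[1] >= min_len


# Toy sequence with a stretch of nine Ts, loosely modelled on the site described above.
toy = "ACG" + "T" * 9 + "GCA"
print(homopolymer_run_at(toy, 5))
print(in_long_run(toy, 5), in_long_run(toy, 1))
```

A filter of this kind does not decide whether a call is real. It only flags indels that lie in the sequence context where both genuine length variation and systematic sequencing error are expected.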

Among indels occurring in gene coding sequence, the 
indel at 1,532 (NCU16002) is seen in fourteen strains. The 
deletion of one A from this position is homoallelic and strongly 
supported in thirteen strains, while the addition of one A is 
less well supported in strain FGSC 3246. Four strains 
are identical with the reference genome at this position. 
The deletion at 1,532 causes numerous stop codons in 
the NCU16002 ORF, beginning with a TAG at amino acid 
residue 203, which removes 121 residues from the full-length 
conserved hypothetical protein encoded at NCU16002. In 
strains FGSC 3566 and 3831 this ORF has additional indels 
including the insertion of GG at position 1,356. Position 1,481 
has an insertion of one G in strain 3566 and one C in strain 
3831. 

NCU16001 encodes a truncated version of NADH 
dehydrogenase subunit 2, with the full-length version 
encoded by NCU16006. The truncated NCU16001 ORF 
is 705 nucleotides in length and has multiple indels in five 
strains and all of these indels induce frameshift errors. The 
deletion of the C at position 616 is found in strains FGSC 
3114 and FGSC 3566 and is homoallelic in both strains. 
This deletion causes a frameshift and introduces multiple 
stop codons, the first being a TAG codon at triplet 120 of the 
235 amino acid protein. Similarly, the deletion of one G at 
position 630 in strains 322 and 3921 causes a frameshift that 
introduces a stop codon at position 121, as well as multiple 
stops after that position. There are no indels in the full-length 
version of NADH dehydrogenase subunit 2 (NCU16006) in 
any of the strains sequenced in this program. 
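
The way a single-base deletion shifts the reading frame and exposes a premature stop codon can be shown on a toy coding sequence. This is a sketch only: the sequence is invented, the helper names are hypothetical, and the stop codon set assumes the mold mitochondrial genetic code (NCBI translation table 4), in which TGA encodes tryptophan rather than stop.

```python
from typing import Optional

# Assumption: mold mitochondrial code (NCBI translation table 4), where TGA is not a stop.
STOP_CODONS = {"TAA", "TAG"}


def first_stop_codon(cds: str) -> Optional[int]:
    """Return the 1-based index of the first in-frame stop codon, or None if absent."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3 + 1
    return None


def delete_base(cds: str, pos: int) -> str:
    """Delete the base at 1-based position pos."""
    return cds[:pos - 1] + cds[pos:]


# Invented 18-base ORF: ATG GCC TTA GCT GGA TAA (stop at codon 6).
toy_cds = "ATGGCCTTAGCTGGATAA"
mutant = delete_base(toy_cds, 6)  # remove one C; the frame now reads ATG GCT TAG ...

print(first_stop_codon(toy_cds))
print(first_stop_codon(mutant))
```

In the toy example the deletion moves the first stop from codon 6 to codon 3, the same kind of truncation described above for the deletions at positions 616 and 630 in NCU16001.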

In all, twenty mitochondrial ORFs have indels (Table 3) 
and of these, nine ORFs have indels within the protein coding 
region of the gene. Eleven ORFs have insertions or deletions 
in an intron and four have indels directly adjacent (3' or 
5') to the ORF. Five ORFs have no insertions or deletions 
and these include the full-length version of the NADH 
dehydrogenase subunit 2 (NCU16006), two hypothetical 
proteins (NCU16011, NCU16023), and two endonucleases 
(NCU16014 and NCU16021). 

Seventy-two larger rearrangements with both endpoints 
within the mitochondrial genome were detected among 

Table 3. Mitochondrial open reading frames (ORF) with polymorphisms relative to the reference genome. 

ORF        Name                                      Type

SNPs
NCU16007   NADH dehydrogenase subunit 3              Intron SNPs
NCU16008   NADH dehydrogenase subunit 4L             Intron SNPs
NCU16009   hypothetical protein                      Intron SNPs
NCU16012   NADH dehydrogenase subunit 5              Intron SNPs
NCU16015   LAGLIDADG endonuclease                    5' and CDS SNPs

INDELS
NCU16001   NADH dehydrogenase subunit 2              FS
NCU16002   conserved hypothetical protein            FS
NCU16003   cytochrome c oxidase subunit 3            FS
NCU16004   NADH dehydrogenase subunit 6              FS
NCU16005   mitochondrial ribosomal protein S5 (S3)   FS
NCU16007   NADH dehydrogenase subunit 3              int, 5', 3'
NCU16008   NADH dehydrogenase subunit 4L             int, 5'
NCU16009   hypothetical protein                      int
NCU16010   LAGLIDADG endonuclease                    int
NCU16012   NADH dehydrogenase subunit 5              int
NCU16013   cytochrome b                              int
NCU16015   LAGLIDADG endonuclease                    FS
NCU16016   cytochrome c oxidase subunit 1            5', 3'
NCU16017   hypothetical protein                      FS
NCU16018   NADH dehydrogenase subunit 1              int
NCU16019   group I intron endonuclease               FS
NCU16020   NADH dehydrogenase subunit 4              int
NCU16022   hypothetical protein                      FS
NCU16024   ATPase subunit 8                          int
NCU16025   ATPase subunit 6                          int

*FS = frameshift inducing indel, int = intron indel 



these 18 strains using Breakdancer (Chen et al. 2009a). An 
additional 37 rearrangements have one endpoint on a 
chromosome in the nuclear genome. All of the polymorphisms 
detected with Breakdancer are unique, although nineteen 
share an endpoint with another variant. In each of these cases, 
the variants sharing an endpoint were found within a single 
strain. Twenty-five of the polymorphisms detected 
with Breakdancer were deletions while twelve were insertions. 
The average deletion was 2,925 bases, although three 
putative deletions of over 20 Kb were identified in different 
strains. The average insertion was 123 bases with a range of 
96 to 159 bases. 
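
The separation of rearrangements into those with both endpoints in the mitochondrial genome and those with one endpoint on a nuclear chromosome amounts to a simple classification of endpoint pairs. The sketch below assumes each call has been reduced to a pair of contig labels. The label "mito", the chromosome names, and the example calls are hypothetical and do not reproduce Breakdancer's actual output format.

```python
from collections import Counter

MITO = "mito"  # hypothetical label for the mitochondrial contig


def endpoint_class(chrom_a: str, chrom_b: str, mito: str = MITO) -> str:
    """Classify a rearrangement by how many of its two endpoints are mitochondrial."""
    n_mito = int(chrom_a == mito) + int(chrom_b == mito)
    if n_mito == 2:
        return "intra-mitochondrial"
    if n_mito == 1:
        return "nuclear-mitochondrial"
    return "nuclear only"


# Hypothetical endpoint pairs for four calls.
calls = [("mito", "mito"), ("mito", "chr3"), ("chr1", "mito"), ("mito", "mito")]
summary = Counter(endpoint_class(a, b) for a, b in calls)
print(summary)
```

Tallying calls this way keeps the nuclear-mitochondrial class separate, so that it can be examined on its own as a possible artifact.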



DISCUSSION 

Overall there is a very low level of SNVs in the mitochondrial 
genomes of the eighteen strains characterized by whole genome 
sequence analysis. Even the strain with the most SNVs, FGSC 
3566, had only 36 SNVs and most of these were at positions 
where both the reference genome base and an alternate base 
were detected. Interestingly, several of the strains characterized 
in the present study are related to those used in the pioneering 
work clearly showing uniparental inheritance of Neurospora 
mitochondria (Mannella et al. 1979). Strain FGSC 821 was 
deposited as a spontaneous mutant arising in strain 4A, which 
is the designation used for the Abbott strain in Mannella et al. 
(1979). In this earlier work, Abbott strains were described as 
mitochondrial genome type I. Similarly, strains in the Lindegren 
and St Lawrence backgrounds are described as having type 
II mitochondria. On the deposit form submitted with the strain, 
FGSC 1363 was explicitly listed as being in a Lindegren 
background. Other strains in the current analysis were 
backcrossed into the St Lawrence background (for example, 
FGSC 7035). The possibility that the strains in the current study 
contain the same mitochondrial genomes as those described in 
Mannella et al. (1979) is supported by the presence of the G 
for C SNP at position 2,246 in both the St Lawrence type genome 
(FGSC 7035) and the Lindegren derived strain (FGSC 1363) 
as well as thirteen additional strains, but not in strain FGSC 821 
(the Abbott strain). 

Two of the indels in NCU16001 (the truncated NADH 
dehydrogenase subunit 2) occur in multiple strains and are 
well supported although both of these indels occur in short 
strings of the same base. NCU16002 encodes a conserved 
hypothetical protein and has multiple unique and shared indels 
including the second most common indel in the mitochondrial 
genome among these strains. The deletion of one A from 
position 1,532 in this ORF removes 121 amino acids from the 
final putative protein product. This ORF, also known as ufILM 
(D'Souza et al. 2005), has little orthology to other proteins in 
the PUBMED NR protein database, and has no conserved 
protein domains. The finding of these frameshift inducing 
indels suggests that these two genes are both pseudogenes 
resulting from an ancestral partial duplication within the 
mitochondrial genome (Agsteribbe et al. 1989). While many 
mitochondrial ORFs have indels, these do not follow the same 
pattern of bias towards indels that do not disrupt the reading 
frame as was seen for indels in nuclear genes (McCluskey 
et al. 2011). Although the observation of the same indel in 
multiple strains lends credence to the view that they are an 
accurate representation of the underlying sequence, indels 
are commonly seen occurring in runs of the same base and 
it cannot be determined from these data whether these are 
changes in the mitochondrial genome or systematic errors 
in the sequencing process. Although intrachromosomal 
rearrangements have been previously implicated as being 
responsible for the start-stop growth phenotype of so-called 
stopper mutants, the rearrangements found in the present 
study do not correspond to those described for the stopper 
E35 mutant (de Vries et al. 1986). Indeed, the rearrangements 
typically only comprise a fraction of the reads for a given 
region. The anomalous characterization of interchromosomal 
recombination between nuclear and mitochondrial genomes 
by the Breakdancer program suggests artifacts arising either 
from library construction or in silico in the subsequent analysis. 
The possibility that mitochondrial sequences are found in 
the nuclear genome or that nuclear sequence is present in 
the mitochondrial genome cannot be assessed without 
additional investigation. 

While a traditional view of the mitochondria is that of 
individual cell-like organelles (Luck 1963), recent study 
suggests more of a filamentous or syncytial structure 
(Bowman et al. 2009) with the mitochondrial DNA organized 
into nucleoids (Gilkerson et al. 2008, Basse 2010). Moreover, 
recent analysis of the mitochondrial proteome is adding to 
the understanding of the role of nuclear and mitochondrial 
encoded genes (Keeping et al. 2011). While it may be 
attractive to suggest that the deleterious mutations detected 
in a fraction of the reads in the whole genome sequencing 
of Neurospora strains represent defective mitochondrial 
genomes present in an otherwise healthy background, the 
present level of analysis does not allow this conclusion. The 
fact that most of the indels were homoallelic contrasts markedly 
with the observation that most of the SNVs were multiallelic. 
Larger scale rearrangements detected by the Breakdancer 
algorithm were, like the SNVs, mostly multiallelic. Whether 
these observations provide insight into fundamental aspects 
of mitochondrial genome maintenance cannot be determined 
with the present dataset. Additional experiments, for example 
comparing sequence from freshly germinated conidia to that 
generated from stationary-phase cultures, may allow insight 
into the nature of these polymorphisms. Future studies may 
take advantage of the information presented here to, for 
example, amplify unique DNA fragments only generated by 
deletions or large-scale rearrangements. Recent advances in 
whole genome sequencing may enable experimental analysis 
of mutation and rearrangements of the mitochondrial genome in 
Neurospora, other fungi, and indeed all organisms. 